Digging into Connected Repositories (DiggiCORE)
Contents
- Statement of significance
- List of Participants
- Narrative
- Project objectives and research questions
- The Necessity of Using Large Datasets
- DiggiCORE Partnership
- Data Sources in DiggiCORE
- Existing Research and Legacy Technology Available to DiggiCORE
- Technology and Methodology to be used by DiggiCORE
- Standards used in DiggiCORE
- Training of Graduate Students and Novice Researchers
- Related work
- Risk Management and Risk Mitigation
- Compliance with Ethical Principles
- References
Statement of significance
In every discipline of the sciences or humanities there are publications and people who shape that discipline. These publications are recognised as having significant impact, and their authors are held in high esteem by their communities. Other community members follow these impact publications as an important source of knowledge, use them in developing their own contributions to the discipline and cite them in their own papers. In the large collections of publications that exist online, it is important not only to provide access to the impact publications but also to make explicit why a publication has high impact. Similarly, it is important to measure the productivity and impact of authors, because their careers and funding depend on it.
There are many approaches to measuring the impact of publications and journals and the productivity of academics (Garfield, 2006). They are typically based only on citations, which has many advantages but also significant shortcomings. For example, the criteria used to evaluate impact do not take into account the actual content of the publication, and the relationship between a publication and the literature it cites is determined solely by the author. Moreover, such impact calculations involve a time delay while the rest of the community reacts. Recent developments in information technology make it possible to advance beyond the current state of the art in identifying key contributions and contributors to a discipline. Firstly, there are effective methods for measuring semantic relatedness, especially the similarity of publications (Knoth et al., 2010). These measures allow us to cluster related publications and associate the clusters with disciplines. Secondly, the system of publications and their authors can be perceived and represented as a large social network which, in principle, does not differ from networks such as Facebook or Twitter. This network can be analysed and its key information flows identified. DiggiCORE aims to combine methods for the semantic analysis of text with techniques for analysing social networks to produce a novel approach to understanding, accessing and following key information sources, identifying trends within disciplines and detecting cross-fertilisation between them.
In practical terms, the goal of DiggiCORE is to analyse a vast set of research publications from the Open Access domain using natural language processing and social network analysis methods in order to identify patterns in the behaviour of research communities, recognise trends in research disciplines, gain new insights into the citation behaviour of researchers and discover features that distinguish papers with high impact. The knowledge acquired from this analysis will significantly influence other disciplines. For example, it will enable the development of better methods for exploratory search and browsing in digital collections, or of new ways of evaluating research and researchers' impact.
To enable the analysis, the DiggiCORE project will develop a new software infrastructure providing access to well-structured, organised information acquired by harvesting, cleaning, integrating and processing a very large and fast-growing collection of millions of research publications distributed across more than 1,800 Open Access repositories and Open Access journals. The DiggiCORE infrastructure will be freely accessible to the public via a set of web services.
The questions that the DiggiCORE project aims to answer include: What are the attributes of impact publications? Do these attributes differ in the humanities, social sciences and computer sciences? What are the features of research groups within disciplines, and how do these features relate to the contributions generated by the group? What are the attributes of high-impact authors and what is their role within the group? What are the dynamics of successful research groups? What is the mechanism of cross-fertilisation between disciplines, especially between the humanities and the sciences? Who are the authors whose work is worth monitoring because they contribute to the achievements of their own discipline and also inspire other disciplines? How should novices in a discipline become acquainted with its key achievements? How should they search for the most important publications?
List of Participants
Project partners
The Open University, Knowledge Media Institute (KMi/OU)
The European Library/Koninklijke Bibliotheek (KB/TEL)
Key personnel
Edwards Louise, The European Library/Koninklijke Bibliotheek
Juffinger Andreas, The European Library/Koninklijke Bibliotheek
Knoth Petr, Knowledge Media Institute, The Open University
Stephens Owen, Consultant, The Open University
Wolf Annika, Knowledge Media Institute, The Open University
Zdrahal Zdenek, Knowledge Media Institute, The Open University
Advisory Board
Molendijk Jan, Technical and Operations Director, Europeana Foundation
Scantlebury Non, Head of Faculty Team, Library Services, The Open University
van Wesenbeeck Astrid, Executive Director, SPARC Europe
Authors of letters of commitment
Kirkpatrick Denise, Professor, Pro Vice-Chancellor Learning, The Open University
Edwards Louise, Head of The European Library
Narrative
Project objectives and research questions
Objectives
At a strategic level, DiggiCORE has two main objectives:
- To develop a new software infrastructure that will enable the exploration and analysis of very large and fast-growing numbers of research publications (millions of records) stored across Open Access Repositories (OAR) and Open Access journals worldwide.
- To analyse the available information using both text mining (natural language processing - NLP) and social network analysis methods with the goals of identifying patterns in the behaviour of research communities, detecting trends in research disciplines, gaining new insights into the citation behaviour of researchers and discovering features that distinguish papers with high impact.
Expected outputs
The project will generate the following outputs:
- A software infrastructure delivered to users as a free web service (API) that will enable the analysis of the behaviour of research communities in the Open Access domain. The web service will operate on top of a large Linked Data repository that will be created within the project and will contain well-structured information acquired during the data analysis.
- New knowledge and understanding resulting from the data analysis, and novel methods for exploiting this knowledge to identify important individuals and research groups, domain trends and publications with high impact.
Users
DiggiCORE users will be researchers from various disciplines, who can access the DiggiCORE services directly using the API, and libraries and research institutions, which can integrate the services within their existing infrastructure. DiggiCORE will allow users to investigate the relationships between the impact of publications, citation patterns and the role of authors within a discipline. Users will be able to compare these relationships across disciplines and over time. In particular, they will be able to compare impact patterns in the arts and humanities, social sciences and computer sciences, since the relationships between the impact of a publication and other factors are known to differ between disciplines. Finally, the project will make it possible to measure the coverage of individual disciplines in Open Access publishing.
Research challenges
In order to achieve these strategic objectives, the project will address the following research problems:
- Mining semantic relations from full text using NLP methods.
- Clustering content based on different semantic relations. This includes identifying centres of gravity in each cluster and associating research areas/topics within each cluster.
- Developing citation networks that interconnect publications across repositories and publication databases (digital libraries) and analysing relationships between structures in the citation network and the semantic similarity of resources.
- Developing a citation network of authors using methods for the representation and analysis of social networks. By comparing the semantics of topics and the positions of publications within topic clusters, it will be possible to identify the roles of authors in the associated research community.
- Developing formal methods for analysing the impact of publications and authors.
- Mining trends as temporal relations of topics and within topics and analysing inter-topic relationships.
- Mapping new results to the existing ontologies and metadata standards, extending these ontologies and publishing results as Linked Open Data.
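To make the trend-mining challenge more concrete, the following is a minimal Python sketch. It assumes that the clustering step has already given each publication a year and a topic label. The input format, the function names and the `min_growth` parameter are illustrative assumptions, not project decisions. The sketch computes each topic's share of a year's publications and flags topics whose share grows between two years.

```python
from collections import Counter, defaultdict

def topic_shares(papers):
    """papers: iterable of (year, topic). Returns {year: {topic: fraction of that year's papers}}."""
    per_year = defaultdict(Counter)
    for year, topic in papers:
        per_year[year][topic] += 1
    return {
        year: {topic: n / sum(counts.values()) for topic, n in counts.items()}
        for year, counts in per_year.items()
    }

def emerging_topics(papers, start, end, min_growth=2.0):
    """Return topics whose share grew by at least `min_growth` between `start` and `end`,
    plus topics that are absent in `start` but present in `end`."""
    shares = topic_shares(papers)
    before, after = shares.get(start, {}), shares.get(end, {})
    return sorted(
        topic for topic, share in after.items()
        if topic not in before or share / before[topic] >= min_growth
    )
```

Relative shares are used instead of raw counts because the total number of deposited publications is itself growing quickly (see Fig. 1). With raw counts, almost every topic would appear to be "emerging".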
Technical challenges
In order to enable data analysis, DiggiCORE will develop a new infrastructure for data mining from research publications. This will require performing the following tasks:
- Establishing an infrastructure for metadata and content harvesting from Open Access Repositories and Open Access journals. Although approximately 70% of repositories support the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) as a standard, problems remain because of different versions of the protocol and various deviations from the standard. The remaining repositories (~30%) use other protocols for metadata harvesting. For each of these, a decision must be made, based on the benefit/cost ratio, whether to include it in DiggiCORE.
- Cleaning metadata. Even when content is harvested from repositories that use the OAI-PMH protocol, repositories use the protocol in many different ways. For further processing, the metadata must be disambiguated and converted to a normalised format.
- Extracting and cleaning full-text documents. About 30% of the metadata records harvested from OARs contain a reference to the full-text paper, typically in PDF format. Where the full text is available, DiggiCORE will extract it and carry out additional cleaning. In a small percentage of cases, the PDF files are not machine-readable (they contain only scanned images). In other cases, the file is readable but contains only an error message.
- Integrating the developed technology with the existing infrastructure of CORE and The European Library.
- Developing an infrastructure for publishing DiggiCORE results as Linked Open Data.
- Developing DiggiCORE web services for users to support the access and exploration of the publications, research topics and the network of authors.
The Necessity of Using Large Datasets
The number of publications available from Open Access Repositories and digital libraries has exploded in recent years. The quantitative characteristics are given in the section Data Sources in DiggiCORE. Publications cover practically all scientific and cultural disciplines and provide an excellent source of information for scientists and for the general public.
There are two main reasons to use large datasets in DiggiCORE:
- Large collections open new opportunities to deepen understanding of the social behaviour of research communities and to develop new methods for recognising important individuals and trends in various research disciplines. The interesting data patterns cannot be observed on smaller document sets, because:
  - Citation structures acquired from small datasets would not allow the discovery of publications and/or authors that facilitate the transfer of ideas between disciplines. Such bridges have been found to be often associated with researchers or publications that make major contributions.
  - Analysing social networks of researchers and citation graphs on a smaller scale would not produce conclusive results, because behaviour is known to differ across disciplines. Large-scale analysis will enable us to find patterns that generalise across all research areas as well as patterns that are domain-specific.
  - The ability to analyse vast amounts of Open Access publications (which cannot easily be done at the moment because the resources are distributed) will enable us to compare citation behaviour in the Open Access domain with traditional approaches to publishing. This will provide evidence to support innovative publishing approaches.
- Large document repositories exist, and it is therefore highly desirable to develop new, scalable methods that will improve access to digital content. The sheer volume of information sources currently makes it difficult to discover key research publications and authors using keyword search alone. The combination of text mining and social network analysis will stimulate new methods for exploratory search and for the discovery of high-impact individuals and research documents.
DiggiCORE Partnership
The DiggiCORE consortium consists of two partner organisations: The Open University (UK) and The European Library (NL).
The group at the Knowledge Media Institute (KMi) of the Open University has a long and successful record of research in the area of text mining, information retrieval, analysis and the Semantic Web. The research has focused on:
- Mining semantic relations from large repositories of textual documents including Open Access repositories
- Text mining and organising multilingual content in various domains including genetics, encyclopaedias and other areas of education.
- Analysing and mining large datasets describing the behaviour of more than 200,000 students to discover new patterns characteristic of students at risk, with the aim of improving their retention.
- Exposing information in the Linked Data format.
- Discovering important patterns in textual descriptions of objects and events in the area of cultural heritage; organising representations of cultural objects into narratives, publishing narratives as Linked Data and supporting navigation across narrative spaces in museum collections.
- Semantic analysis of large patent databases to support innovation processes, including mining the key components of patent structure.
This research has been carried out as a part of a number of European and national projects. The most recent are: CORE, RETAIN (JISC funded), Eurogene, Tech-IT-Easy, Decipher, Bletchley Park Text (EU funded).
The European Library, based in the Koninklijke Bibliotheek (National Library of the Netherlands), offers free access to the resources of 48 national libraries. The team at The European Library has a long record of metadata harvesting, based on various access protocols, and of metadata cleaning. These activities are laborious, require experience and skill, and are very important for further information processing. The current effort at The European Library includes the processing of 25 million digitised newspaper pages, with about half of the articles coming from the 20th century. The documents have been scanned and converted to text using Optical Character Recognition (OCR), so the conversion to machine-readable text is likely to be imperfect. The natural language processing methods developed in DiggiCORE will be a very useful instrument for comparing these documents and extracting trends and other temporal properties from them. Processing this large collection is expected to be part of the DiggiCORE exploitation plan.
The European Library’s excellent exploitation record also includes integrating innovative approaches in library services and reaching an extensive user community. We are certain that DiggiCORE exploitation will benefit from The European Library’s experience and contacts.
The knowledge and skills of both partners complement each other, and together provide complete support for achieving the DiggiCORE objectives described above.
Data Sources in DiggiCORE
The data sources for DiggiCORE will be documents available in Open Access Repositories. In recent years, the number of repositories and documents has been growing quickly. As of June 2011, there were 1,863 repositories, containing over 28 million documents. Growth since 2004 is shown in Fig. 1.

Figure 1. Growth of repositories and documents (published with the kind permission of BASE)
Records in repositories contain metadata, which in most repositories can be accessed via the OAI-PMH interface. About 70% of repositories use the OAI-PMH protocol, and the remaining 30% use other protocols. An estimated 82% of all documents are in English. The remaining 18% are published in other languages, notably French, German and Spanish. Metadata does not include the full text of the document but may contain a reference to it. Out of all OAI-PMH repositories, about 30% contain a URL referencing the document, which must then be downloaded using a standard web request.
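A downloaded file is not always the paper itself. Landing pages and error pages are common. A cheap first check is to inspect the payload before any text extraction. The Python sketch below is a minimal illustration of this idea. The function names and category labels are hypothetical. It only recognises the `%PDF-` header that PDF files begin with and obvious HTML. Detecting image-only PDFs would require a later text-extraction step.

```python
import urllib.request

def fetch(url: str, timeout: float = 30.0):
    """Download a full-text candidate; returns (payload bytes, declared content type)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read(), resp.headers.get_content_type()

def classify_download(content: bytes, content_type: str = "") -> str:
    """Heuristically label a payload as 'pdf', 'html', 'empty' or 'unknown'."""
    if not content.strip():
        return "empty"
    # PDF files start with a "%PDF-" header; tolerate a little leading junk.
    if b"%PDF-" in content[:1024]:
        return "pdf"
    head = content.lstrip()[:1024].lower()
    if b"<html" in head or b"<!doctype html" in head or "html" in content_type.lower():
        return "html"  # usually a landing or error page rather than the paper
    return "unknown"
```

Records whose download is classified as anything other than `pdf` can be logged and retried later instead of being passed on to PDF-to-text conversion.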
Individual repositories can be accessed through directories of Open Access Repositories, such as OpenDOAR (http://www.opendoar.org/) or the directory of open access journals, DOAJ (http://www.doaj.org/).
All documents in Open Access Repositories are freely available on the web. Authors' rights to the documents are covered by various forms of IPR agreement, most often Creative Commons licences. Because of the Open Access policy, IPR is not an issue in DiggiCORE. The results of DiggiCORE will be published under a Creative Commons licence as a Linked Data endpoint.
Existing Research and Legacy Technology Available to DiggiCORE
The project will benefit from existing research and development by both partners. The Open University will build on the current JISC-supported Connecting Repositories (CORE) project (February 2011–July 2011). CORE harvests content from British Open Access Repositories, calculates selected semantic relations from full-text documents and publishes the results as Linked Open Data. Users, whether individuals or libraries, will be provided with an API that they can use to integrate the CORE web service into their infrastructure or application.
The European Library provides infrastructure and software tools for searching a collection of multilingual scientific and cultural digital resources from 48 national libraries in Europe. This content is accessible via The European Library's unified ingestion manager (UIM) framework, which KB/TEL also provides to the Europeana project. In addition, The European Library can access approximately 2.9 million Open Access scientific publications (about 400,000 from DOAJ and 2.5 million from DRIVER). The European Library's experience in integrating new tools with existing infrastructures, and its exploitation potential, are crucial for DiggiCORE.
Technology and Methodology to be used by DiggiCORE

Figure 2. Three networks to be constructed in DiggiCORE: (a) network of semantically related papers, (b) citation network, (c) author citation network.
The project work will be organised into 4 work packages. Work packages 1 and 2 are further divided into specific tasks:
Content harvesting and data preparation
Software infrastructure for harvesting
For metadata harvesting, we will adapt the existing software developed in the CORE project and the software used by The European Library. The main focus will be on the OAI-PMH protocol, which is widely used by the Open Access community. Special attention will be paid to non-standard uses of the OAI-PMH protocol in repositories. Although about 70% of repositories use OAI-PMH, other access protocols will also be investigated, and the cost/benefit ratio of integrating them will be evaluated. Standardisation around the OAI-PMH protocol is expected to continue. DiggiCORE will follow this process and ensure that its harvesting technology remains compatible.
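The following minimal Python sketch of OAI-PMH harvesting uses only the standard library. It issues `ListRecords` requests with the `oai_dc` metadata prefix and follows `resumptionToken`s, as defined in the OAI-PMH 2.0 specification. It is an illustration under these assumptions, not the CORE or UIM code. It deliberately omits retries, rate limiting and the repository-specific workarounds for non-standard OAI-PMH use. The function names and the record dictionary layout are hypothetical.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH 2.0 and simple Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def parse_list_records(xml_bytes):
    """Parse one ListRecords response into (records, resumption token or None)."""
    root = ET.fromstring(xml_bytes)
    error = root.find("oai:error", NS)
    if error is not None:
        if error.get("code") == "noRecordsMatch":
            return [], None  # an empty result set is reported as an error code
        raise RuntimeError(f"OAI-PMH error {error.get('code')}: {error.text}")
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        header = rec.find("oai:header", NS)
        if header is None or header.get("status") == "deleted":
            continue  # deleted records carry no metadata
        records.append({
            "identifier": header.findtext("oai:identifier", default="", namespaces=NS),
            "titles": [e.text for e in rec.iterfind(".//dc:title", NS) if e.text],
            "creators": [e.text for e in rec.iterfind(".//dc:creator", NS) if e.text],
            # dc:identifier often holds the full-text URL, when one is given
            "links": [e.text for e in rec.iterfind(".//dc:identifier", NS) if e.text],
        })
    token = root.find(".//oai:resumptionToken", NS)
    # An empty resumptionToken element marks the last page of the list.
    text = (token.text or "").strip() if token is not None else ""
    return records, text or None

def harvest(base_url):
    """Yield oai_dc records from one repository, following resumption tokens."""
    params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    while True:
        url = base_url + "?" + urllib.parse.urlencode(params)
        with urllib.request.urlopen(url, timeout=60) as resp:
            records, token = parse_list_records(resp.read())
        yield from records
        if token is None:
            return
        # resumptionToken is an exclusive argument, so it replaces metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": token}
```

Parsing is kept separate from fetching so that parsing quirks of individual repositories can be tested offline against saved responses. A call such as `harvest("https://repository.example.org/oai")` (a placeholder URL) would then stream records page by page.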
Harvesting and cleaning content
Metadata will be cleaned and normalised; for example, authors' names, paper titles and citations will be converted into a standard form. For this process, DiggiCORE will adapt The European Library's UIM framework and software developed in CORE. The harvested and cleaned metadata will be used to download and clean full-text documents. These documents are typically in PDF format, and the objective is to convert them into plain text. The text will then be parsed to create a structured representation of each article, i.e. to recognise and divide the individual sections of the document, identify individual citations in the reference section, etc.
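As an illustration of converting author names into a standard form, the Python sketch below maps common variants ("Smith, John", "John Smith", "J.Smith") to a single comparison key. The key format ("surname, first initial") and the function name are hypothetical choices made for this example. Accents are removed so that, for example, "Zdráhal" and "Zdrahal" compare equal.

```python
import re
import unicodedata

def normalise_author(raw: str) -> str:
    """Return a hypothetical 'surname, initial' key, e.g. 'Smith, John' -> 'smith, j'."""
    # Decompose accented characters and drop the combining marks.
    text = unicodedata.normalize("NFKD", raw)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"\s+", " ", text).strip()
    if "," in text:
        # "Surname, Given names" order
        surname, _, given = text.partition(",")
    else:
        # "Given names Surname" order; split "J.Smith" into "J." and "Smith"
        parts = text.replace(".", ". ").split()
        if not parts:
            return ""
        surname, given = parts[-1], " ".join(parts[:-1])
    initial = next((ch for ch in given if ch.isalpha()), "")
    return f"{surname.strip().lower()}, {initial.lower()}".rstrip(", ")
```

Such a key deliberately merges different people who share a surname and initial. It also does not handle name particles ("van Wesenbeeck, Astrid" and "Astrid van Wesenbeeck" produce different keys). Keys like this are therefore a starting point for the disambiguation heuristics described under Risk 3, not a replacement for them.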
Digging into Data
Semantic relations, themes and clusters
The objective of this task is to discover selected types of cross-document semantic relations from the documents' full texts (Figure 2a). DiggiCORE will employ text mining methods to measure the semantic similarity of texts. It will also investigate the possibility of recognising relations defined in Cross-document Structure Theory (CST), including follow-up, attribution, contradiction, equivalence and subsumption. The software developed in CORE will be extended to discover these relationships from the text. Based on semantic relatedness/similarity, documents will also be clustered, and each cluster will be labelled with (associated with) a discipline of the sciences or humanities. As each research publication typically records its year of publication, this approach will enable the discovery and monitoring of trends in research and the identification of newly emerging fields. By combining text mining techniques with social network analysis methods, it will become possible to recognise researchers in these emerging disciplines and to identify people who act as cross-fertilisers between disciplines.
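TF-IDF weighting with cosine similarity is one common way to measure textual similarity. The sketch below uses it as a stand-in; the similarity measure actually used in CORE may differ. It follows the similarity step with a simple single-link clustering: documents are grouped whenever their similarity reaches a threshold. The tokeniser, the function names and the threshold value are illustrative assumptions.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict term -> weight) per document."""
    counts = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    df = Counter(term for c in counts for term in c)  # document frequency
    return [{t: tf * math.log(n / df[t]) for t, tf in c.items()} for c in counts]

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def cluster(vectors, threshold):
    """Single-link clustering: union documents whose similarity >= threshold."""
    parent = list(range(len(vectors)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(len(vectors)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

The all-pairs loop in `cluster` is quadratic in the number of documents. At the scale of millions of documents it would need the kind of candidate-pair reduction described under Risk 1.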
Constructing research and citation networks
Two networks (Figure 2b-c) will be constructed as a result of the metadata and full-text analysis. The first, the citation network, will be built by connecting publications (i.e. using the relation Publication_1 cites Publication_2). This network can be represented as a directed graph with publications as vertices. Since only publications written in the past can be cited, this network typically does not contain cycles. The second, the author citation network, connects authors (i.e. Author_1 cites Author_2 in Pub_3). This can be represented as a directed, labelled graph with authors as vertices and publications as edge labels. Unlike the simple citation network, the author citation network can contain cycles, which often represent discourses in research communities. It can therefore be viewed as a social network similar to Facebook or Twitter.
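The construction of the two networks, and the difference in their cycle structure, can be sketched in Python. The sketch below derives the labelled author citation edges (Author_1, Author_2, Pub_3) from publication-level citations and an author list per publication, and checks for cycles using Kahn's topological-sort algorithm. The input formats and function names are hypothetical. Because real metadata can contain citation cycles (e.g. from parsing errors), the same check may be useful for flagging suspicious data in the citation network.

```python
from collections import defaultdict, deque

def build_author_network(citations, authors):
    """citations: (citing_pub, cited_pub) pairs; authors: pub -> list of author keys.
    Returns labelled edges (citing_author, cited_author, citing_pub)."""
    edges = set()
    for citing, cited in citations:
        for a1 in authors.get(citing, []):
            for a2 in authors.get(cited, []):
                edges.add((a1, a2, citing))
    return edges

def has_cycle(edges):
    """Kahn's algorithm: a directed graph is acyclic iff all nodes can be topologically ordered."""
    succ, indeg, nodes = defaultdict(set), defaultdict(int), set()
    for u, v in edges:
        nodes.update((u, v))
        if v not in succ[u]:
            succ[u].add(v)
            indeg[v] += 1
    queue = deque(n for n in nodes if indeg[n] == 0)
    ordered = 0
    while queue:
        u = queue.popleft()
        ordered += 1
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return ordered < len(nodes)  # nodes left over lie on (or behind) a cycle
```

For example, suppose author A writes P1 and P3, author B writes P2, P2 cites P1 and P3 cites P2. The citation network P3 → P2 → P1 is acyclic, whereas the author network contains the cycle A → B → A.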
Analysis of research and citation networks
The aim of this task is to analyse all types of networks and compare their structures with each other using social network analysis methods. This task will require us to measure parameters such as centrality (e.g. betweenness and closeness), cohesion and structural equivalence for different parts of the network. In particular, it is interesting to compare the citation and author citation networks against the thematic/discipline clusters. We expect the structure of the citation network to depend on topics and disciplines (Shi et al., 2010). From this structure it would be possible to appreciate and evaluate the impact of individual papers on the development of a new discipline. The author citation network will allow us to associate communities of authors with topics/disciplines and to identify the key roles in each discipline's development.
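Two of the measures named above can be computed on an unweighted directed graph as shown in the minimal Python sketch below. Betweenness uses Brandes' algorithm (unnormalised scores). Closeness is computed over the nodes reachable from a given node. The adjacency-dictionary input format is an assumption. A production system would more likely rely on an established graph library.

```python
from collections import deque

def betweenness(graph):
    """Brandes' algorithm for unweighted directed graphs.
    graph: dict node -> iterable of successors. Returns unnormalised scores."""
    nodes = set(graph) | {w for succ in graph.values() for w in succ}
    score = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        stack, pred = [], {v: [] for v in nodes}
        sigma = dict.fromkeys(nodes, 0)   # number of shortest paths from s
        sigma[s] = 1
        dist = dict.fromkeys(nodes, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:                      # BFS from s
            v = queue.popleft()
            stack.append(v)
            for w in graph.get(v, ()):
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        delta = dict.fromkeys(nodes, 0.0)
        while stack:                      # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score

def closeness(graph, node):
    """Closeness over the nodes reachable from `node`; 0.0 if it reaches none."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        v = queue.popleft()
        for w in graph.get(v, ()):
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0
```

In a citation network, a paper with high betweenness lies on many shortest citation paths between other papers. This makes it a candidate for the bridging publications discussed earlier. Any such interpretation would, however, need to be validated against the discipline clusters.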
Publishing DiggiCORE results
The information about the constructed networks will be published as Linked Open Data. On top of that, DiggiCORE will develop a set of web services that will provide free access to the very large collection of harvested Open Access publications. The services will support both efficient search and navigation of the integrated repository of Open Access publications (i.e. accessing and downloading full-text content and metadata, retrieving information about related papers, disciplines, etc.) and efficient, easy-to-use methods for analysing the constructed networks. The resulting infrastructure can thus be reused flexibly for a number of purposes. The target audience includes, for example, digital libraries and repositories that might integrate the service into their specific systems, and social and computer science researchers who will use it to access the large network of publications for their research. To demonstrate the usability of the service, DiggiCORE will develop three applications that interact with the service: (1) the DiggiCORE Portal, a web portal that will enable users to interact with the system through a web interface; (2) the DiggiCORE Client, an example plug-in showing how to integrate the DiggiCORE service into a specific digital library; (3) DiggiCORE Mobile, an application for searching and navigating the Open Access content on mobile devices, including smartphones and tablets.
Exploitation and Dissemination
The exploitation and dissemination strategy will be based on the experience of The European Library. A promotional video will illustrate integration with existing applications. The project will organise two web conferences to present its results. The results will be presented at three or more conferences, and one paper will be submitted to a scientific journal.
The expected duration of work packages and tasks is shown in the Gantt chart below.

Figure 3. DiggiCORE Gantt chart
Standards used in DiggiCORE
To provide flexibility in reusing the developed software infrastructure, DiggiCORE will adopt a service-oriented approach to the technology solution. The services will communicate in a standard manner, exchanging information using the XML-RPC, SOAP or REST protocols. The metadata will be made available in a standard RDF format. A suitable format, such as OWL or RDF(S), will be used to describe the newly acquired metadata that will originate from the text analysis. The software tools will be designed with the help of the Unified Modeling Language (UML) and will adhere to a set of open standards that will ensure long-term compatibility with other applications. The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) will be used for collecting the metadata from distributed Open Access repositories.
The developed software infrastructure will be free for public use. The newly created metadata collection will be released as Linked Data, allowing third parties to reuse it free of charge. The project partners will also investigate the possibility of releasing the software as Open Source.
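The Python sketch below shows, under stated assumptions, how citation links could be serialised as RDF in the line-based N-Triples syntax without any third-party library. The publication namespace `http://example.org/diggicore/publication/` is a placeholder, not a published DiggiCORE URI. Using the CiTO `cites` property and the Dublin Core terms `title` property is one reasonable vocabulary choice, not a project decision.

```python
from urllib.parse import quote

BASE = "http://example.org/diggicore/publication/"   # hypothetical namespace
CITO_CITES = "http://purl.org/spar/cito/cites"         # CiTO 'cites' property
DCT_TITLE = "http://purl.org/dc/terms/title"           # Dublin Core terms 'title'

def iri(pub_id: str) -> str:
    """Build an IRI for a publication, percent-encoding characters such as ':' in OAI ids."""
    return f"<{BASE}{quote(pub_id, safe='')}>"

def literal(text: str) -> str:
    """Escape a string as an N-Triples literal."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'

def to_ntriples(titles, citations):
    """titles: pub -> title; citations: (citing, cited) pairs. One triple per line."""
    lines = [f"{iri(p)} <{DCT_TITLE}> {literal(t)} ." for p, t in sorted(titles.items())]
    lines += [f"{iri(s)} <{CITO_CITES}> {iri(o)} ." for s, o in sorted(citations)]
    return "\n".join(lines) + "\n"
```

N-Triples is used here only because it is the simplest RDF serialisation to generate correctly by hand. The same triples could be loaded into a triple store and exposed in other RDF formats.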
Training of Graduate Students and Novice Researchers
The DiggiCORE research will be carried out by experienced teams at both the Open University and The European Library. We also expect junior researchers, PhD students or interns to participate in selected research activities, and the corresponding funding has been earmarked in the budget. The involvement of students and young researchers should prove very useful both for the project and for the participants. In the publications describing previous projects similar to DiggiCORE, approximately 30% of the authors are young researchers or graduate students.
Related work
Library federations and harvesting
Open Source harvesting tools, such as MOAI (MOAI, 2008) and D-NET (Manghi et al., 2010), aim to deliver a software solution for metadata harvesting that can easily be integrated into complex and specialised library systems. Another system, OJAX (Wusteman, 2009), is concerned with the development of a user interface for a federation of OAI-PMH repositories. Web portals, on the other hand, such as BASE (Pieper, 2006) and WorldCat (the OAIster database (Hagedorn, 2003)), provide metasearch over a collection of more than 25 million records directly to end users. In addition, BASE offers a programmable API for searching its collection. The last group of systems consists of complex publishing platforms that, among other functionalities, are capable of exposing library content using OAI-PMH.
Although there are a number of tools for metadata harvesting, few of them download the full texts; most rely entirely on the provided metadata. As a result, their functionality and potential for content analysis are limited.
Linking scholarly information using Semantic Data
The role of metadata in Digital Libraries (DL) has been fundamental to their operation since the emergence of the domain approximately two decades ago. Yet the notion of metadata has been extended significantly, from the initial formal bibliographic information describing an artefact to more descriptive data. This includes user-contributed or social data, activity data, automatically extracted information, inferred knowledge, and aggregated and composite information. The need to formalise the structure and semantics of these elements has been acknowledged by well-established bodies such as the Dublin Core Metadata Initiative and the Open Geospatial Consortium. However, work on formalising the representation of specific metadata information is still ongoing. A popular method is to describe the metadata using an ontology and to publish the artefact information as Linked Open Data. This can be considered a bottom-up approach that focuses on quickly exposing the available data. Interoperability between systems can then be achieved by aligning their underlying ontologies at any later stage.
However, the Semantic Web and Linked Open Data are not just about making information available in an interoperable format. They are also about connecting and linking relevant resources so that the web can be explored better, both by people and by machines. Over the last decade, there has been significant interest in tools that can express and represent information about relationships between semantic data resources, including research papers. For example, the Mendeley tool (Henning, 2008) allows users to annotate relationships to other articles and to share this metadata with others. Similar work was done earlier by Uren (2003): ClaiMaker offers a network model for summarising research debates, both over a whole body of literature and for individual documents. However, these tools depend on manual annotation, which does not scale up to datasets of even moderate size (Knoth et al., 2009). Unlike these methods, DiggiCORE will use text processing techniques to discover related information automatically. To allow system interoperability, DiggiCORE will express the discovered relations as Linked Data.
Social network analysis
Social networks have been used to analyse interactions between people or groups for many decades. A network is usually represented as a graph whose vertices describe social entities and whose edges characterise the relationships between them. The classic monograph on social network analysis is Wasserman and Faust (1994). Properties of a network are studied by investigating graph structures such as cycles and paths. Such structures have also been used to characterise relations between publications. For example, Klink et al. (2006) construct and analyse different types of network from relations such as being a co-author, being a co-author of a co-author, and publishing in similar conferences or journals. Enkhsaikhan et al. (2008) construct a network of publishing authors, disambiguate their identities and focus their analysis on spatial and temporal criteria. Shi et al. (2010) investigate the impact of publications and compare the results across disciplines by analysing the structure of the citation graph. All these approaches analyse syntactic (structural) properties of the graph. DiggiCORE aims to enrich this analysis by also considering the semantics of the network relations.
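To illustrate the difference between purely structural and semantically enriched analysis, the sketch below compares a plain citation count (in-degree) with a count in which each citation is weighted by the textual similarity of the citing and cited papers. The bag-of-words cosine similarity is a simple stand-in assumed here for illustration, not the semantic similarity method of Knoth et al. (2010); the function names and toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def bag_of_words(text: str) -> Counter:
    """Lower-cased whitespace tokens; a crude stand-in for a real text model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def citation_scores(texts: dict[str, str], citations: list[tuple[str, str]]):
    """Return (plain in-degree, similarity-weighted in-degree) per cited paper.

    texts: paper id -> text (e.g. title and abstract)
    citations: (citing id, cited id) edges of the citation graph
    """
    vectors = {pid: bag_of_words(t) for pid, t in texts.items()}
    plain = defaultdict(int)
    weighted = defaultdict(float)
    for citing, cited in citations:
        plain[cited] += 1  # structural view: every citation counts as 1
        weighted[cited] += cosine(vectors[citing], vectors[cited])  # semantic view
    return dict(plain), dict(weighted)


texts = {
    "a": "graph mining citation networks",
    "b": "citation networks analysis",
    "c": "medieval poetry",
}
plain, weighted = citation_scores(texts, [("b", "a"), ("c", "a")])
print(plain["a"], round(weighted["a"], 2))  # 2 0.58
```

In the toy example, paper "a" has two citations, but its weighted score is only about 0.58 because one of the citing papers shares no vocabulary with it.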
Risk Management and Risk Mitigation
Risk 1: Calculating semantic relations, e.g. similarities, may be computationally too expensive for the large data collections expected.
Probability: Low
Risk mitigation plan: The number of document pairs to be compared will be reduced by suitable heuristics, e.g. using the metadata retrieved for both documents. As a result, the similarity matrix should be block-diagonal, with the blocks on the diagonal small enough to be computed.
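The blocking idea can be sketched as follows, assuming each document's metadata carries a single field (here a hypothetical `subject` key) that is used to group documents. Similarities are computed only for pairs within the same group; if documents are ordered by group, the resulting similarity matrix is block-diagonal, with cross-group entries left at zero because they are never computed.

```python
from collections import defaultdict
from itertools import combinations


def blocked_pairs(docs, block_key):
    """Yield document pairs that share a block; pairs across blocks are skipped.

    docs: document id -> metadata dict
    block_key: function mapping a metadata dict to a block label (hypothetical heuristic)
    """
    blocks = defaultdict(list)
    for doc_id, meta in docs.items():
        blocks[block_key(meta)].append(doc_id)
    for members in blocks.values():
        # Only pairs inside one block are compared, so if documents are ordered
        # by block, the similarity matrix is block-diagonal.
        yield from combinations(sorted(members), 2)


docs = {
    "d1": {"subject": "physics"},
    "d2": {"subject": "physics"},
    "d3": {"subject": "physics"},
    "d4": {"subject": "history"},
    "d5": {"subject": "history"},
    "d6": {"subject": "biology"},
}
pairs = list(blocked_pairs(docs, lambda meta: meta["subject"]))
n = len(docs)
print(len(pairs), "of", n * (n - 1) // 2, "pairs compared")  # 4 of 15 pairs compared
```

With six documents split into groups of three, two and one, the number of compared pairs drops from 15 to 4. Documents with several subjects would need to be placed in more than one block, which this sketch does not handle.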
Risk 2: Non-standard use of the OAI-PMH protocol by Open Access repositories.
Probability: Medium. Of the 70% of repositories that support the OAI-PMH protocol, a modest percentage use various non-standard modifications.
Risk mitigation plan: Non-standard use is often applied systematically across a whole repository, so a single solution applies to all documents in that repository. A number of such cases have already been resolved in previous projects.
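Because a non-standard behaviour is usually systematic within a repository, one natural design is a per-repository fix-up function applied to every record that repository returns. The sketch below parses an OAI-PMH `ListRecords` response with Dublin Core metadata using Python's standard library and applies such a fix. The quirk shown (dates in `DD/MM/YYYY` rather than ISO 8601 form), the repository identifiers and the `REPOSITORY_FIXES` registry are hypothetical examples; the OAI-PMH and Dublin Core namespace URIs are the standard ones.

```python
import re
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def fix_dmy_dates(record):
    """Hypothetical quirk: dc:date given as DD/MM/YYYY; rewrite it as YYYY-MM-DD."""
    m = re.fullmatch(r"(\d{2})/(\d{2})/(\d{4})", record.get("date", ""))
    if m:
        day, month, year = m.groups()
        record["date"] = f"{year}-{month}-{day}"
    return record


# Hypothetical registry: one fix per repository, applied to every record it returns.
REPOSITORY_FIXES = {"repo-with-dmy-dates": fix_dmy_dates}


def parse_list_records(xml_text, repo_id):
    """Return (records, resumption token or None) from one ListRecords page."""
    root = ET.fromstring(xml_text)
    fix = REPOSITORY_FIXES.get(repo_id, lambda r: r)  # standard repositories: no-op
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        record = {"identifier": rec.findtext("oai:header/oai:identifier", default="", namespaces=NS)}
        dc = rec.find("oai:metadata/oai_dc:dc", NS)
        if dc is not None:  # deleted records carry no metadata
            record["title"] = dc.findtext("dc:title", default="", namespaces=NS)
            record["date"] = dc.findtext("dc:date", default="", namespaces=NS)
        records.append(fix(record))
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS)
    return records, token or None


SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example paper</dc:title>
          <dc:date>31/12/2009</dc:date>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

records, token = parse_list_records(SAMPLE, "repo-with-dmy-dates")
print(records, token)
```

The returned resumption token, when present, would be passed to the next `ListRecords` request to continue harvesting; an empty token marks the last page.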
Risk 3: Author names are difficult to disambiguate; e.g. in "J. Smith" the initial "J" may stand for many first names. A similar problem occurs when two or more individuals share the same first name and surname.
Probability: Low
Risk mitigation plan: A number of heuristics will be used to disambiguate names, including the author's affiliation, discipline, the recurrence of the same co-authors in other papers and the date of publication, among others. In the unlikely case that all heuristics fail, the problem will be recorded in an error log and the affected record excluded from the author network until resolved by the system administrator.
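A minimal sketch of how the listed heuristics might be combined, assuming a simple additive score: two mentions are considered the same author only if their names are compatible and the evidence from affiliation, discipline, shared co-authors and publication date reaches a threshold. The weights, the threshold, the five-year window and all names are hypothetical; pairs that the heuristics cannot confirm are written to an error log instead of being guessed.

```python
from dataclasses import dataclass, field

# Hypothetical weights and threshold; real values would have to be tuned on labelled data.
WEIGHTS = {"affiliation": 2.0, "discipline": 1.0, "coauthors": 3.0, "year": 1.0}
THRESHOLD = 3.0
YEAR_WINDOW = 5  # hypothetical: mentions more than 5 years apart get no date evidence


@dataclass
class AuthorMention:
    name: str  # as printed in the paper, e.g. "J. Smith"
    affiliation: str = ""
    discipline: str = ""
    year: int = 0
    coauthors: set[str] = field(default_factory=set)


def split_name(name: str) -> tuple[str, str]:
    """Return (first name or initial, surname), both lower-cased."""
    parts = name.replace(".", " ").split()
    first = parts[0].lower() if len(parts) > 1 else ""
    return first, parts[-1].lower()


def names_compatible(a: str, b: str) -> bool:
    """Same surname, and first names that agree ("J." is compatible with "John")."""
    first_a, last_a = split_name(a)
    first_b, last_b = split_name(b)
    if last_a != last_b:
        return False
    if not first_a or not first_b:
        return True
    if len(first_a) == 1 or len(first_b) == 1:
        return first_a[0] == first_b[0]
    return first_a == first_b


def evidence_score(a: AuthorMention, b: AuthorMention) -> float:
    """Add up the heuristics listed in the mitigation plan."""
    score = 0.0
    if a.affiliation and a.affiliation == b.affiliation:
        score += WEIGHTS["affiliation"]
    if a.discipline and a.discipline == b.discipline:
        score += WEIGHTS["discipline"]
    if a.coauthors & b.coauthors:
        score += WEIGHTS["coauthors"]
    if a.year and b.year and abs(a.year - b.year) <= YEAR_WINDOW:
        score += WEIGHTS["year"]
    return score


def resolve(a, b, error_log):
    """True = same author, False = different authors, None = unresolved (logged)."""
    if not names_compatible(a.name, b.name):
        return False
    if evidence_score(a, b) >= THRESHOLD:
        return True
    # Heuristics could not confirm the match: log it and keep the mention out of
    # the author network until an administrator decides.
    error_log.append((a.name, b.name))
    return None


log = []
a = AuthorMention("J. Smith", "Open University", "Computer Science", 2009, {"K. Novak"})
b = AuthorMention("John Smith", "Open University", "Computer Science", 2010, {"K. Novak"})
c = AuthorMention("Jane Smith", "Open University", "Computer Science", 2010)
d = AuthorMention("J. Smith", "Another University", "History", 1990)
print(resolve(a, b, log), resolve(b, c, log), resolve(a, d, log), log)
# True False None [('J. Smith', 'J. Smith')]
```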
Risk 4: The full text of a document is in a non-standard or non-machine-readable PDF format.
Probability: Very low. This has been observed in very few cases.
Risk mitigation plan: These cases will be excluded from further processing.
Compliance with Ethical Principles
The project does not raise any ethical questions.
References
- [Enkhsaikhan et al., 2008] Enkhsaikhan, M., Liu, W. and Reynolds, M. Geographical and temporal visualisation of social relationships. In 12th Pacific Asia Conference on Information Systems (PACIS 2008), Hong Kong, 1, pages 433–443, 2008.
- [Garfield, 2006] Eugene Garfield. The history and meaning of the journal impact factor. Journal of the American Medical Association, 2006.
- [Hagedorn, 2003] Kat Hagedorn. OAIster: a "no dead ends" OAI service provider. Library Hi Tech, 21(2):170–181, 2003.
- [Henning and Reichelt, 2008] Victor Henning and Jan Reichelt. Mendeley - A Last.fm For Research. In 2008 IEEE Fourth International Conference on eScience, pages 327–328, 2008.
- [Klink et al., 2006] Klink, S., Reuther, P., Weber, A., Walter, B. and Ley, M. Analysing social networks within bibliographical data. In DEXA 2006, LNCS 4080, pages 234–243. Springer-Verlag, 2006.
- [Knoth et al., 2009] Petr Knoth, Jakub Novotny, and Zdenek Zdrahal. Semantic annotation of multilingual learning objects based on a domain ontology. In European Conference on Technology Enhanced Learning (EC-TEL 2009), Nice, France, 2009.
- [Knoth et al., 2010] Petr Knoth, Jakub Novotny, and Zdenek Zdrahal. Automatic generation of inter-passage links based on semantic similarity. In Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010), pages 590–598, Beijing, China, August 2010.
- [MOAI, 2008] MOAI developers. MOAI, an open access server platform for institutional repositories, 2008. http://moai.infrae.com/.
- [Manghi et al., 2010] Paolo Manghi, Marko Mikulicic, Leonardo Candela, Michele Artini, and Alessia Bardi. General-purpose digital library content laboratory systems. In Mounia Lalmas, Joemon Jose, Andreas Rauber, Fabrizio Sebastiani, and Ingo Frommholz, editors, Research and Advanced Technology for Digital Libraries, Lecture Notes in Computer Science, pages 14–21. Springer Berlin / Heidelberg, 2010.
- [Pieper, 2006] Dirk Pieper and Friedrich Summann. Bielefeld Academic Search Engine (BASE): an end-user oriented institutional repository search service. Library Hi Tech, 24(4):614–619, 2006.
- [Shi et al., 2010] Shi, X., Leskovec, J. and McFarland, D. A. Citing for high impact. In Proceedings of the 10th Annual Joint Conference on Digital Libraries (JCDL '10). ACM Press, 2010.
- [Uren et al., 2003] Victoria Uren, Simon Buckingham Shum, Gangmin Li, John Domingue and Enrico Motta. Scholarly publishing and argument in hyperspace. In WWW '03: Proceedings of the 12th International Conference on World Wide Web, pages 244–250, New York, NY, USA, 2003. ACM.
- [Wasserman and Faust, 1994] Wasserman, S. and Faust, K. Social Network Analysis: Methods and Applications. Cambridge University Press, 1994.
- [Wusteman, 2009] Judith Wusteman. Ojax: a case study in agile web 2.0 open source development. ASLIB Proceedings, 61(3):212–231, 2009.

